A Comparative Study of Statistical and Data Mining Algorithms for Prediction Performance
Abstract
The aim of this study is to compare statistical and data mining modelling techniques: statistical Logistic Regression, and the data mining techniques Decision Tree and Neural Network. The comparison evaluates each technique by its overall prediction accuracy (percentage agreement). The ratio of the binary values of the dependent variable in the training dataset and the population is varied for the three techniques to find the effect of this ratio on prediction performance. For the given data set, the results show that the performance of the three techniques is broadly comparable, with the Neural Network slightly outperforming the others. A factor that affects prediction accuracy is the distribution of the dependent variable's values (the distribution of “0”s and “1”s). For all three techniques, the overall prediction accuracy is high when the ratio of “0”s to “1”s is 3:1, whereas for the ratios 2:1 and 1:1 the performance is lower.
Keywords: Data Mining, Classification, Prediction Model, Statistical Logistic Regression, Neural Network,
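The abstract does not include the study's data or code, so the following is only a minimal sketch of the evaluation metric it names. It computes overall prediction accuracy as percentage agreement between actual and predicted labels, and reports what a trivial classifier that always predicts the more frequent value would score at the three ratios mentioned (3:1, 2:1, 1:1). The function names and the `make_labels` helper are hypothetical, and the abstract does not say that this baseline effect explains its results; it is shown only as context that may help when comparing accuracy across ratios.

```python
def overall_accuracy(actual, predicted):
    """Percentage of cases where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * matches / len(actual)


def make_labels(zeros, ones, unit=100):
    """Hypothetical helper: build a binary label list with a zeros:ones ratio."""
    return [0] * (zeros * unit) + [1] * (ones * unit)


def majority_baseline(actual):
    """Predict the most frequent label for every case (ties go to 0)."""
    majority = 1 if sum(actual) * 2 > len(actual) else 0
    return [majority] * len(actual)


if __name__ == "__main__":
    # Ratios of "0"s to "1"s taken from the abstract.
    for zeros, ones in [(3, 1), (2, 1), (1, 1)]:
        actual = make_labels(zeros, ones)
        acc = overall_accuracy(actual, majority_baseline(actual))
        print(f"{zeros}:{ones} majority-class baseline accuracy = {acc:.1f}%")
```

Under these assumptions the baseline alone reaches 75% at 3:1, about 66.7% at 2:1, and 50% at 1:1, so accuracy figures obtained at different ratios are easier to interpret when read against the corresponding baseline.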
Similar resources
Personal Credit Score Prediction using Data Mining Algorithms (Case Study: Bank Customers)
Knowledge and information extraction from data is an age-old concept in scientific studies. In industrial decision-making processes, the application of this concept gives rise to data-mining opportunities. Personal credit scoring is an ever-vital tool for banking systems in order to manage and minimize the inherent risks of the financial sector, thus, the design and improvement of credit scorin...
Evaluation of Data Mining Algorithms for Detection of Liver Disease
Background and Aim: The liver, as one of the largest internal organs in the body, is responsible for many vital functions, including purifying blood, regulating the body's hormones, and preserving glucose. Therefore, disruptions in these functions will sometimes be irreparable. Early prediction of these diseases will help their early and effective treatm...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data given the vast content of data particularly created by educational systems. Data mining algorithms have been used in educational systems especially e-learning systems due to the broad usage of these systems. Providing a model to predict final student results in educational course is a reason for usi...
Accuracy Improvement of Mood Disorders Prediction using a Combination of Data Mining and Meta-Heuristic Algorithms
Introduction: Since the delay or mistake in the diagnosis of mood disorders due to the similarity of their symptoms hinders effective treatment, this study aimed to accurately diagnose mood disorders including psychosis, autism, personality disorder, bipolar, depression, and schizophrenia, through modeling and analyzing patients' data. Method: Data collected in this applied developmental resear...
Comparison of Four Data Mining Algorithms for Predicting Colorectal Cancer Risk
Background and Objective: Colorectal cancer (CRC) is one of the most prevalent malignancies in the world. The early detection of CRC is not only a simple process, but it is also the key to its treatment. Given that data mining algorithms could be potentially useful in cancer prognosis, diagnosis, and treatment, the main focus of this study is to measure the performance of some data mining class...